
    The Wikipedia Image Retrieval Task

    The Wikipedia image retrieval task at ImageCLEF provides a testbed for the system-oriented evaluation of visual information retrieval from a collection of Wikipedia images. The aim is to investigate the effectiveness of retrieval approaches that exploit textual and visual evidence in the context of a large and heterogeneous collection of images that are searched for by users with diverse information needs. This chapter presents an overview of the available test collections, summarises the retrieval approaches employed by the groups that participated in the task during the 2008 and 2009 ImageCLEF campaigns, provides an analysis of the main evaluation results, identifies best practices for effective retrieval, and discusses open issues.
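
    As a rough illustration of the combined textual and visual evidence such systems use (a minimal sketch under assumed score maps and weight, not any participant's actual method), scores from the two modalities can be fused linearly after normalisation:

        # Minimal sketch of linear late fusion of text and visual retrieval
        # scores; the score dictionaries and the weight alpha are assumptions.
        def min_max_normalise(scores):
            """Rescale a {doc_id: score} map to [0, 1] for comparability."""
            lo, hi = min(scores.values()), max(scores.values())
            span = (hi - lo) or 1.0
            return {doc: (s - lo) / span for doc, s in scores.items()}

        def fuse(text_scores, visual_scores, alpha=0.7):
            """Linearly combine normalised scores; alpha weights the text side."""
            t = min_max_normalise(text_scores)
            v = min_max_normalise(visual_scores)
            docs = set(t) | set(v)
            fused = {d: alpha * t.get(d, 0.0) + (1 - alpha) * v.get(d, 0.0)
                     for d in docs}
            return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)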

    Overview of the wikipediaMM task at ImageCLEF 2008

    The wikipediaMM task provides a testbed for the system-oriented evaluation of ad-hoc retrieval from a large collection of Wikipedia images. It became part of the ImageCLEF evaluation campaign in 2008 with the aim of investigating the combined use of visual and textual sources for improving retrieval performance. This paper presents an overview of the task's resources, topics, assessments, participants' approaches, and main results.

    Hybrid focused crawling on the Surface and the Dark Web

    Focused crawlers enable the automatic discovery of Web resources about a given topic by navigating the Web link structure and selecting which hyperlinks to follow based on their estimated relevance to the topic of interest. This work proposes a generic focused crawling framework for discovering resources on any given topic that reside on the Surface or the Dark Web. The proposed crawler seamlessly navigates the Surface Web and several darknets within the Dark Web (i.e., Tor, I2P, and Freenet) during a single crawl by automatically adapting its crawling behavior and its classifier-guided hyperlink selection strategy based on the destination network type and the strength of the local evidence in the vicinity of a hyperlink. The work investigates 11 hyperlink selection methods, among them a novel strategy based on the dynamic linear combination of a link-based classifier and a parent-Web-page classifier. The hybrid focused crawler is demonstrated on the discovery of Web resources containing recipes for producing homemade explosives, and the evaluation experiments indicate its effectiveness on both the Surface and the Dark Web.
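
    As a concrete illustration of such a hybrid selection strategy, the following minimal sketch blends a link-based and a parent-page classifier with a weight driven by the strength of the local evidence. It assumes scikit-learn-style classifiers; the names and the confidence heuristic are assumptions, not the paper's exact formulation.

        def link_score(link_features, page_features, link_clf, page_clf):
            """Score a hyperlink by dynamically blending two classifiers.

            The link classifier sees local evidence (anchor text, URL tokens,
            surrounding text); the page classifier sees the whole parent page.
            """
            p_link = link_clf.predict_proba([link_features])[0][1]
            p_page = page_clf.predict_proba([page_features])[0][1]
            # Weight shifts towards the link classifier when its prediction is
            # far from the 0.5 decision boundary, i.e. local evidence is strong.
            w = abs(p_link - 0.5) * 2
            return w * p_link + (1 - w) * p_page

    Scored links would then be pushed onto a priority queue so that the crawl frontier always expands the most promising hyperlink first.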

    Reliability and effectiveness of clickthrough data for automatic image annotation

    Automatic image annotation using supervised learning is performed by concept classifiers trained on labelled example images. This work proposes the use of clickthrough data collected from search logs as a source for the automatic generation of concept training data, thus avoiding the expensive manual annotation effort. We investigate and evaluate this approach using a collection of 97,628 photographic images. The results indicate that the contribution of search-log-based training data is positive despite its inherent noise; in particular, the combination of manual and automatically generated training data outperforms the use of manual data alone. It is therefore possible to use clickthrough data to perform large-scale image annotation with little manual annotation effort or, depending on performance requirements, using only the automatically generated training data. An extensive presentation of the experimental results and the accompanying data can be accessed at http://olympus.ee.auth.gr/~diou/civr2009/
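
    A minimal sketch of how click records might be turned into noisy concept labels, under an assumed (query, image_id) log layout and a click threshold that are not taken from the paper:

        from collections import defaultdict

        def clicks_to_labels(click_log, min_clicks=3):
            """Treat an image clicked >= min_clicks times for a query term
            as a noisy positive training example for that concept."""
            counts = defaultdict(int)
            for query, image_id in click_log:        # one record per click
                for term in query.lower().split():
                    counts[(term, image_id)] += 1
            labels = defaultdict(set)
            for (term, image_id), n in counts.items():
                if n >= min_clicks:
                    labels[term].add(image_id)       # positives per concept
            return labels

    The resulting per-concept positive sets can then be fed to standard supervised concept classifiers, alone or mixed with manually labelled examples.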

    VITALAS at TRECVID-2008

    In this paper, we present our experiments on the high-level feature extraction task at TRECVID 2008. This being the first year of our participation in TRECVID, our system adopts several popular approaches proposed by other groups in earlier campaigns. We propose two advanced low-level features, a new Gabor texture descriptor and a Compact-SIFT codeword histogram, and use the well-known LIBSVM library to train the base SVM classifiers. In the fusion step, several methods are employed, including voting, SVM-based fusion, HCRF, and Bootstrap Average AdaBoost (BAAB).
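
    For the simplest of the fusion schemes mentioned (voting), a minimal sketch might look as follows; it assumes scikit-learn-style classifier objects rather than the raw LIBSVM bindings, with one feature view per classifier.

        import numpy as np

        def vote_fusion(classifiers, feature_vectors):
            """Each classifier votes on its own feature view of the same shot;
            the concept is detected when a majority of views agree."""
            votes = np.array([clf.predict([x])[0]
                              for clf, x in zip(classifiers, feature_vectors)])
            return votes.sum() > len(classifiers) / 2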

    Measuring and Analyzing the Scholarly Impact of Experimental Evaluation Initiatives

    Evaluation initiatives have been widely credited with contributing greatly to the development and advancement of information access systems, by providing a sustainable platform for conducting the very demanding activity of comparable experimental evaluation at a large scale. Measuring the impact of such benchmarking activities is crucial for assessing which of their aspects have been successful, which activities should be continued, reinforced, or suspended, and which research paths should be pursued in the future. This work introduces a framework for modeling the data produced by evaluation campaigns, a methodology for measuring their scholarly impact, and tools that exploit visual analytics to analyze the outcomes.

    Not All Scale-Free Networks Are Born Equal: The Role of the Seed Graph in PPI Network Evolution

    The (asymptotic) degree distributions of the best-known “scale-free” network models are all similar and are independent of the seed graph used; hence, it has been tempting to assume that networks generated by these models are generally similar. In this paper, we observe that several key topological features of such networks depend heavily on the specific model and the seed graph used. Furthermore, we show that, starting with the “right” seed graph (typically a dense subgraph of the protein–protein interaction (PPI) network analyzed), the duplication model captures many topological features of publicly available PPI networks very well.
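
    A minimal sketch of a basic duplication model grown from a seed graph, illustrating why the seed matters; the retention probability and target size are assumptions, and published variants differ in their divergence steps.

        import random
        import networkx as nx

        def duplication_model(seed, n_target, p=0.3):
            """Grow `seed` to `n_target` nodes by repeated node duplication,
            retaining each copied edge independently with probability p."""
            g = seed.copy()
            next_id = max(g.nodes) + 1
            while g.number_of_nodes() < n_target:
                anchor = random.choice(list(g.nodes))
                g.add_node(next_id)
                for nbr in list(g.neighbors(anchor)):
                    if random.random() < p:
                        g.add_edge(next_id, nbr)
                next_id += 1
            return g

        # Comparing runs from different seeds exposes the seed's footprint:
        # duplication_model(nx.complete_graph(5), 1000) vs.
        # duplication_model(nx.cycle_graph(5), 1000)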

    Deploying Semantic Web Technologies for Information Fusion of Terrorism-related Content and Threat Detection on the Web

    The Web and social media nowadays play an increasingly significant role in spreading terrorism-related propaganda and content. To deploy counterterrorism measures, authorities rely on automated systems for analysing text, multimedia, and social media content on the Web. However, since each of these systems is an isolated solution, investigators often face the challenge of coping with a diverse array of heterogeneous sources and formats that generate vast volumes of data. Semantic Web technologies can alleviate this problem by delivering a toolset of mechanisms for knowledge representation, information fusion, semantic search, and sophisticated analyses of terrorist networks and spatiotemporal information. In the Semantic Web environment, ontologies play a key role by offering a shared, uniform model for semantically integrating information from multimodal heterogeneous sources; an additional benefit is that they can be augmented with powerful tools for semantic enrichment and reasoning. This paper presents such a unified semantic infrastructure for the information fusion of terrorism-related content and threat detection on the Web. The framework is deployed within the TENSOR EU-funded project and consists of an ontology and an adaptable semantic reasoning mechanism. We strongly believe that, in both the short and the long term, these techniques can greatly assist Law Enforcement Agencies in their investigative operations.
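
    A minimal sketch of the fusion idea using rdflib: outputs of separate text- and image-analysis components are asserted into one RDF graph so that a single query spans both. The namespace, classes, and properties here are invented for illustration and are not the TENSOR ontology.

        from rdflib import Graph, Literal, Namespace
        from rdflib.namespace import RDF

        EX = Namespace("http://example.org/threat#")
        g = Graph()

        # Assumed output of a text-analysis component:
        g.add((EX.post42, RDF.type, EX.SocialMediaPost))
        g.add((EX.post42, EX.mentionsTopic, Literal("explosive precursor")))

        # Assumed output of an image-analysis component on the same post:
        g.add((EX.post42, EX.containsConcept, Literal("weapon")))

        # After fusion, one SPARQL query covers both modalities:
        q = """SELECT ?post WHERE {
                   ?post ex:mentionsTopic ?t ;
                         ex:containsConcept ?c . }"""
        for row in g.query(q, initNs={"ex": EX}):
            print(row.post)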

    MLM: A Benchmark Dataset for Multitask Learning with Multiple Languages and Modalities

    In this paper, we introduce the MLM (Multiple Languages and Modalities) dataset, a new resource for training and evaluating multitask systems on samples in multiple modalities and three languages. The generation process and the inclusion of semantic data provide a resource that further tests the ability of multitask systems to learn relationships between entities. The dataset is designed for researchers and developers who build applications that perform multiple tasks on data encountered on the Web and in digital archives. A second version of MLM provides a geo-representative subset of the data with weighted samples for the countries of the European Union. We demonstrate the value of the resource in developing novel applications in the digital humanities with a motivating use case, and specify a benchmark set of tasks to retrieve modalities and locate entities in the dataset. Evaluation of baseline multitask and single-task systems on the full and geo-representative versions of MLM demonstrates the challenges of generalising on diverse data. In addition to the digital humanities, we expect the resource to contribute to research in multimodal representation learning, location estimation, and scene understanding.
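
    For the retrieval side of such a benchmark, a standard recall@k evaluation over embedding matrices might be sketched as follows; the row-aligned data layout is an assumption, not the MLM distribution format.

        import numpy as np

        def recall_at_k(query_emb, gallery_emb, k=10):
            """Fraction of queries whose ground-truth item (same row index
            in the gallery) appears in the top-k by cosine similarity."""
            q = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
            g = gallery_emb / np.linalg.norm(gallery_emb, axis=1, keepdims=True)
            sims = q @ g.T                                # queries x gallery
            topk = np.argsort(-sims, axis=1)[:, :k]
            hits = (topk == np.arange(len(q))[:, None]).any(axis=1)
            return float(hits.mean())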